WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 



PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6 :
G06F 7/00, 15/16

A1

(11) International Publication Number: WO 00/26762
(43) International Publication Date: 11 May 2000 (11.05.00)



(21) International Application Number: PCT/US99/25504 

(22) International Filing Date: 29 October 1999 (29.10.99) 



(30) Priority Data:
09/183,603    30 October 1998 (30.10.98)    US



(71) Applicant: SEARCHGENIE.COM, INC. [US/US]; 399 E. 10th Avenue, Suite 110,
Eugene, OR 97401 (US).

(72) Inventor: RICHANBACH, Mark; 877 Fairway View Drive, Eugene, OR 97401 (US).

(74) Agent: HEILBRUNN, Elise, R.; Beyer & Weaver, LLP, P.O. 
Box 61590, Palo Alto, CA 94306 (US). 



(81) Designated States: JP, European patent (AT, BE, CH, CY, DE,
DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE).



Published 

With international search report. 



(54) Title: INTERNET SEARCHING SYSTEM 
(57) Abstract 

A system for information retrieval on the internet requests the information 
providers, such as web site administrators, to provide characterizing data in the 
form of questions a user may ask in looking for the information (314). The 
characterizing data (404) is stored in a database along with the destination data 
(406) that indicates where the information may be found on the internet. An 
information seeker may then enter a query in the form of a natural language 
question (204). The question in the query is then matched against questions 
stored in the database (608). If there is a match, the associated destination data 
is employed to retrieve information for the information seeker (610). Both the 
information seeker and the information providers may also furnish filter values 
(408, 410) to filter the information retrieved in order to allow the system to 
provide only the most relevant information to the information seeker. 

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to information retrieval systems. More
particularly, the present invention relates to information retrieval systems and
methods therefor that harness the multi-user, open-ended nature of the Internet to
minimize the costs associated with maintaining such systems while allowing
information seekers to find the desired information in an accurate and timely manner.



2. Description of the Related Art 

Despite its recent origin, the Internet has rapidly become an important source
of information for individuals and businesses. The popularity of the Internet as an
information source is due, in part, to the vast amount of available information that can
be downloaded by almost anyone having access to a computer and a modem. The
Internet's strength also lies in its open-ended nature. That is, the Internet is not
supervised or controlled by any person or entity, and anyone having some elementary
Internet skills can create and own a web site for the purpose of publishing information
thereto. These and other factors have caused an exponential increase in Internet usage
and with it, an exponential increase in the volume of information available.

Unfortunately, the overwhelming amount of information available on the
Internet also presents formidable challenges to users who wish to rapidly and
accurately locate relevant information on the ever-expanding and ever-changing
Internet. To help users access the information available on the Internet, many
different information retrieval techniques have been developed. By way of example,
databases have been developed by entities known as search engine companies, which
typically employ a large number of people to access, review and categorize the vast
number of web pages and web sites on the Internet to facilitate searches by web users.
Once the database is built, an Internet user may log on to the search engine's web site
and employ a suitable search front end, or user interface, to search through the
database in order to identify the catalogued web page(s) or web site(s), which were
placed into the database in advance by employees of the search engine companies, as
mentioned earlier. In this sense, the databases created and maintained by the search
engine companies function much like the familiar "Yellow Pages" phone books, albeit
in electronic form.

To facilitate discussion, the search front end or user interface portion of an
exemplary search engine known as Excite!™, which is available on the Internet, is
shown in FIG. 1. The user interface, such as that shown in FIG. 1, is typically created
by the search engine company, and may be accessed by the information seeker by
logging into a designated web site (e.g., the Excite!™ home page). Through the user
interface of FIG. 1, the information seeker may enter an appropriate query to allow
the search engine to search through the database (not shown) for the purpose of
finding the web pages or web sites that contain the information specified by the
entered query.

Referring first to FIG. 1, a user may enter in block 102 a query containing key
words that best represent the information sought. By way of example, the user may
enter the phrase "merced microprocessor" to find, for example, a local dealer for
computers that employ this Intel-based microprocessor. Depending on how the query
is entered, the search engine then parses the query in order to search through the
search engine's existing database. In the example of FIG. 1, the phrase "merced
microprocessor" in block 102 may be parsed such that the word "merced" creates a
first set of hits while the word "microprocessor" creates a second set of hits. In the
absence of any Boolean operator, as is the case in FIG. 1, it may be assumed, for
example, that the user wishes to find all web pages or web sites that contain both the
term "merced" and the term "microprocessor." Using Boolean operators in search
queries is conventional and will not be discussed in great detail here for brevity's
sake.
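
The implicit-AND interpretation described above can be illustrated with a toy inverted index. The following Python sketch is hypothetical: the page identifiers and texts are invented, and it is not how Excite!™ or any particular search engine is implemented. Each query term yields its own set of hits and, absent a Boolean operator, only pages present in every set are returned.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page IDs whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search_all_terms(index, query):
    """With no Boolean operator, treat the query as an implicit AND of its terms."""
    # Each term produces its own set of hits (e.g., "merced", then "microprocessor").
    hit_sets = [index.get(term, set()) for term in query.lower().split()]
    if not hit_sets:
        return set()
    # Keep only the pages that appear in every set of hits.
    return set.intersection(*hit_sets)


# Hypothetical pages, for illustration only.
pages = {
    "page-a": "Merced microprocessor dealer in Eugene",
    "page-b": "Merced county visitor guide",
    "page-c": "Notes on microprocessor design",
}
index = build_index(pages)
print(sorted(search_all_terms(index, "merced microprocessor")))  # ['page-a']
```

Each additional term can only shrink the result set, which is why an AND of common words still tends to return large numbers of loosely related pages.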

FIG. 1 also illustrates exemplary search results that may be obtained by the
Excite!™ search engine from the supplied query "merced microprocessor." As
shown, the result of a typical search may include an indication of the number of "hits"
(i.e., web sites or web pages that fit or approximately fit the criteria specified by the
entered query), a list of web sites that may be deemed by the search engine to be most
relevant to the query, and a brief description of each of the web sites displayed.

As shown, the user-submitted query, "merced microprocessor," yields 51,871
hits. While seemingly high, this large number of hits is not at all unusual nowadays
given the vast, global nature of the Internet and the ease with which web pages and
web sites may be added thereto or modified therefrom. In fact, as search engine
databases attempt to be as inclusive as possible, the number of hits is likely to
increase, not decrease, in the future for a given query.

At some point, the sheer number of hits returned renders the search results less
than useful. By way of example, although the exact web site that contains the
information sought may be found somewhere in the 51,871 web sites and web pages
returned in the search results, that relevant web site is essentially buried and may be
difficult if not impossible to find in any reasonable amount of time. In fact, a typical
user would not and perhaps could not, given time and network constraints, download
all possible "hits" to inspect for possible relevance.

In order to alleviate the above searching problem, various search engines
available on the Internet may categorize web sites through the use of predefined
categories and/or the use of an independent agent. By way of example, one such
agent may be programmed to crawl through the web pages of a site to search for
repeating terms. Thus, a web site that mentions the phrase "merced microprocessor"
ten times may be deemed more relevant by the agent than a web site that merely
mentions that phrase once in passing. This data may be stored in the search engine's
database and may be employed to sort the hits returned in the search results in order
to give the user some information pertaining to the possible relevance of the web sites
found by the search engine.
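
The frequency-based ranking performed by such an agent can be sketched as follows, assuming the agent has already fetched each page's text. The function names and sample pages are hypothetical, and real agents may weight terms in other ways.

```python
def phrase_count(text, phrase):
    """Count non-overlapping, case-insensitive occurrences of a phrase in page text."""
    return text.lower().count(phrase.lower())


def rank_by_frequency(pages, phrase):
    """Order page IDs so that pages repeating the phrase most often come first."""
    scores = {page_id: phrase_count(text, phrase) for page_id, text in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical pages: one repeats the phrase ten times, one mentions it in passing.
pages = {
    "repeats": "merced microprocessor " * 10,
    "in-passing": "A PC review that mentions the merced microprocessor once.",
}
print(rank_by_frequency(pages, "merced microprocessor"))  # ['repeats', 'in-passing']
```

Because the score simply counts repetitions, a page can raise its rank by repeating the phrase, which is the "word stuffing" weakness discussed below.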

There are, however, disadvantages associated with current Internet
information retrieval systems. By way of example, most current Internet search
engines accept queries in the form of search terms, which may be qualified by the use
of Boolean operators if additional specificity is desired. In the past, such a querying
technique was readily understandable to the technically-oriented few who accessed
the Internet. Nowadays, however, the Internet may be accessed by people from all
walks of life, some of whom may have little or no training in computer searching
methodologies. Accordingly, the requirement that searches be conducted by
specifying search terms linked by Boolean operators represents a significant obstacle
to Internet usage.

Further, it is questionable whether there is a direct correspondence between 
the frequency of usage of a term by a web site and the relevance of that web site to the 
concept represented by the term. At best, it is an educated guess, albeit a poor one, 
about the relevance of a particular web site. Note that even though the relevance of a 
particular web site or web page to a particular concept, term, or keyword may be 
clearly understood by the information provider (e.g., the creators or administrators of 
the web sites or web pages), the knowledge of such information providers is not 
leveraged in any meaningful way by current search engines in determining a web 
site's relevance during a search. By failing to leverage the knowledge of information 
providers in ascertaining the relevance of the information found, current Internet 
information retrieval systems continue to return a high number of false hits or return 
hits that may have little or no relevance to the information seeker's need.

Even if there is some correspondence between the frequency of usage of a 
term by a web site and the relevance of that web site to the concept represented by the 
term, ranking web sites by the frequency with which a particular keyword is 
mentioned unfortunately encourages "word stuffing." Word stuffing refers to the 
practice by which web page creators randomly repeat keywords in various locations 
in the document solely for the purpose of ensuring a high ranking in a search result. 
The temptation to engage in "word stuffing" may be particularly great for information 
providers of commercial, for-profit web sites since the revenues derived from those 
web sites may be tied, either directly or indirectly, to the ability of web users to 
rapidly locate and access the web sites for information and/or purchases. As the 
practice of word stuffing becomes more prevalent, the ranking returned in the search 
becomes meaningless as such ranking no longer reflects the true frequency with which a
particular keyword is honestly employed in the text of the web site.

There is, however, an even greater problem with current Internet information
retrieval systems. As the Internet grows and evolves, the number of web sites and
web pages that exist has grown exponentially. At the same time, established web
sites and web pages do not stay static and unchanged. Instead, the existing web sites
and web pages are modified continually by their owners (i.e., the individuals and
businesses that operate the web sites) as the information that needs to be
communicated changes. These dual problems, coupled with the open nature of the
Internet, render it difficult for the current information retrieval model, which relies on
efforts of the web search engine employees and resources to keep the database
updated, to stay current.

For one, newly created web sites may go unnoticed by an Internet search
engine for a long time. An Internet web site or web page may be "missed" by an
Internet search engine because it is difficult to access, or because it was created in
between crawls by the Internet search engine. Accordingly, the information that is
contained in that newly created web site may remain inaccessible to web users until
the new web site is "discovered" by the Internet search engine and is included in the
database for searching.

Furthermore, even if a web site is already included in the database for
searching, changes to the web site may be missed by an Internet search engine for
quite a long time, rendering the search result inaccurate. This is because, as a
practical matter, Internet search engines have only finite resources, in terms of people
and computing power, to cycle through the web sites and web pages of the Internet to
update their databases. In between crawls, the content of a web site may be changed or a
web site may be removed by the information provider. However, due to the manner
in which databases are currently created and updated by Internet search engines,
such changes may go unnoticed for as long as nearly the entire cycle time.

In view of the foregoing, there are desired Internet information retrieval
systems and methods therefor that permit information seekers to find the desired
information in an accurate and timely manner while minimizing the costs associated
with maintaining such systems.

SUMMARY OF THE INVENTION



The present invention relates, in one embodiment, to an information retrieval
system and methods therefor that may be employed to search for information on the
internet. The information retrieval system permits the multitude of information
providers, such as web site administrators, to submit characterizing data that
characterizes the information to be found, which may be a web site, a web page, a
specific portion of a web page, or a file containing information. In one embodiment,
the characterizing data include questions that the information provider predicts a
typical information seeker may ask when searching for the information. The
information providers also provide destination data regarding where the information
may be obtained, e.g., the URL path to the actual location where the information
resides. The characterizing data for the various destinations are then stored in a
database, along with the associated destination data. Note that since the information
providers are the people who are in control of the information content, leveraging the
information providers in this manner ensures that the database is up-to-date and
accurate.

To search for information, the information seeker enters a query, which includes
a question in one embodiment. The question in the query is then matched against the
questions stored in the database to find a correspondence or a match. If
there is such a correspondence, the associated destination data is employed to retrieve 
the information from the internet to provide an "answer" to the information seeker. 
Both the information seeker and the information providers may also furnish filter 
values to filter the information retrieved in order to allow the system to provide only 
the most relevant information to the information seeker, which is determined in 
accordance with the filter criteria. 
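
One possible shape for the database described in this summary is sketched below in Python. The field names, the example URL, and the matching rule (exact match after lowercasing and removing punctuation) are assumptions made for illustration; the summary does not specify how questions are compared.

```python
import string
from dataclasses import dataclass


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (an assumed matching rule)."""
    no_punct = question.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


@dataclass
class Entry:
    destination: str  # destination data, e.g., the URL where the information resides
    questions: list   # characterizing data: questions the provider expects seekers to ask


class QuestionDatabase:
    """Hypothetical store mapping each normalized provider question to its destinations."""

    def __init__(self):
        self._by_question = {}

    def submit(self, entry: Entry) -> None:
        for question in entry.questions:
            bucket = self._by_question.setdefault(normalize(question), [])
            if entry.destination not in bucket:
                bucket.append(entry.destination)

    def lookup(self, query: str) -> list:
        """Return destinations whose stored questions match the seeker's question."""
        return list(self._by_question.get(normalize(query), []))


db = QuestionDatabase()
db.submit(Entry("http://example.com/heat-pumps",  # hypothetical URL
                ["Who fixes or repairs heat pumps in Eugene, OR?"]))
print(db.lookup("who fixes or repairs heat pumps in Eugene OR"))
# ['http://example.com/heat-pumps']
```

Exact matching after normalization is the simplest possible rule; matching "equivalent or similar" questions, as contemplated elsewhere in this description, would require a looser comparison than this sketch performs.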

These and other features of the present invention will be described in more 
detail below in the detailed description of the invention and in conjunction with the 
following figures. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a screen shot illustrating an exemplary search query which may be 
employed to search a conventional Internet search engine (such as Excite!™) and 
exemplary search results that may be obtained therefrom.

FIG. 2A is an exemplary user interface that permits an information seeker to 
enter one or more questions that may be used to search and retrieve information from 
the Internet according to one embodiment of the present invention. 

FIG. 2B is an exemplary user interface that permits an information seeker to 
enter a filter selection in order to filter information retrieved from the Internet 
according to one embodiment of the present invention. 

FIG. 3A is an exemplary interface that permits an information provider to
enter one or more questions associated with information to be made available on the 
Internet for retrieval by an information seeker according to one embodiment of the 
present invention. 

FIG. 3B is an exemplary interface that permits an information provider to
enter one or more filter values designed to permit information submitted by the 
information provider to be filtered upon retrieval by an information seeker according 
to one embodiment of the present invention. 

FIG. 4 is an exemplary database in which information submitted by an 
information provider may be stored according to one embodiment of the present 
invention. 

FIG. 5 is a diagram illustrating exemplary questions submitted by an 
information seeker upon parsing of the questions into search terms. 

FIG. 6 is a flow diagram illustrating a method for searching the database
according to one embodiment of the present invention. 

FIG. 7 is a flow diagram illustrating a method for searching the database for 
entries containing the search elements as shown in step 608 of FIG. 6 according to 
one embodiment of the invention. 

FIG. 8 is a diagram illustrating potential word equivalencies which may be 
applied during the method for searching illustrated in FIG. 6. 



DETAILED DESCRIPTION OF THE INVENTION 

In the following description, numerous specific details are set forth in order to 
provide a thorough understanding of the present invention. It will be obvious, 
however, to one skilled in the art, that the present invention may be practiced without 
some or all of these specific details. In other instances, well known process steps 
have not been described in detail in order not to unnecessarily obscure the present 
invention. 

An invention is described herein that provides an information retrieval system
that may be applied on the Internet. As will be described in further detail, the
invention leverages the multi-user, open-ended nature of the Internet by allowing
those most knowledgeable about the content of a web site or a web page, i.e., the
information providers such as the creators or administrators of web pages or web
sites, to characterize the content of that web site or web page for search purposes.
Information providers will hereinafter refer to any person or entity in control of
information desired to be made available for retrieval on the Internet. In one
embodiment, the characterizing data includes a question or questions which a typical
information seeker may ask when trying to locate the target web page or web site.

Note that the invention does not require that any single person or entity have 
knowledge or control regarding the content or even existence of all web sites that 
want to be found. In fact, such knowledge or control is impossible on an open-ended 
system such as the Internet. Because the invention requests that the information 
providers themselves (e.g., the creators or administrators of web pages or web sites) 
provide the characterizing information, the invention advantageously leverages the 
multi-user nature of the Internet to maintain the database. 

It is reasoned by the inventors herein that in the context of the Internet, 
information providers such as businesses have a strong incentive to want to be found. 
In fact, some businesses derive a significant portion of their revenue from Internet 
traffic and thus have a strong incentive to keep the characterizing data updated in the 
database for users to quickly access their web sites. Accordingly, unlike in a closed- 
ended system such as a proprietary network in a company where there is little 
incentive, financial or otherwise, for the information providers to keep the database 
updated, the Internet paradigm renders it possible to rely on information providers for
timely updates of the characterizing data for search purposes.

In fact, given the volume of data available on the Internet nowadays, it would 
be highly impractical to rely on a single administrator or group of administrators at 
the Internet search engine company to track and characterize the content of all 
existing web sites and web pages. Thus the Internet paradigm makes it imperative, 
even necessary, that this characterizing data comes from the information providers 
themselves. This is particularly true considering the fact that it is always the 
information provider who is the first to know whether the information in his or her 
own web site/page has been changed. 

Even if there are enough resources for someone other than the information 
providers themselves to continually crawl all the web sites and web pages to update 
the characterizing data in the database substantially instantaneously as changes occur, 
there is still a substantial risk that some web pages would be "missed" during crawls
if such web pages are difficult to access (e.g., because of a proprietary access
interface, a poorly designed and/or convoluted access path from the home page, or the
like). By allowing the information providers themselves to supply the characterizing
data and the identity of the web site/page in the database, such access issues are
substantially eliminated.

In accordance with one aspect of the present invention, an information seeker 
(i.e., Internet user) may enter a natural language query to search for these web sites. 
In one embodiment, the present invention permits a user to search the database 
through the use of a plain language query in the form of one or more questions. The 
information retrieval system then matches the user-entered questions to equivalent or 
similar questions that were supplied by the information providers earlier in the 
database to ascertain the identity (and therefore access path) of the relevant
destinations (i.e., web sites, web pages, targets within specific web pages, or files)
that contain the information sought.

In addition, the information providers may also include in the characterizing
data filtering information that characterizes a destination by its level of sophistication
(e.g., complexity or level of education). This filtering information further
characterizes the web site or web page since it specifies the sophistication of the
target audience of the information provided. Again, requesting the information
providers themselves to provide this information leverages the multi-user, open-ended
nature of the Internet. If desired, the information seeker may supply filtering data in
addition to the query that is used to search for all web sites that have the desired
content. The filtering data supplied by the information seeker may then be matched
against the filter information previously supplied by the information providers to
ensure that a given web site not only satisfies the content criteria (specified in the
query, which is in the form of a question or questions in one embodiment) but also
satisfies the complexity criteria. Thus, the web sites or web pages returned to a
consumer looking for information on the Intel-based Merced™ microprocessor to
make a computer purchase decision would be different from the web sites or web
pages returned to an electronic circuit designer looking to synchronize his designed
circuit with the timing requirements of the Merced™ microprocessor (although in
terms of content, both deal with the same microprocessor!).
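
The two-part test described above, content criteria from the question and complexity criteria from the filter data, might be combined as in the following sketch. The level names follow the complexity levels listed for FIG. 2B; the data layout, the hypothetical URLs, and the rule that a destination passes when any provider-declared level overlaps the seeker's selection are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    url: str              # destination data (hypothetical URLs below)
    questions: frozenset  # normalized provider questions: the content criteria
    levels: frozenset     # provider-declared complexity levels: the filter values


def select(destinations, question, seeker_levels):
    """Keep destinations that match the question AND share a level with the seeker's filter."""
    return [
        d.url
        for d in destinations
        if question in d.questions
        # Assumed rule: an empty seeker selection is treated as "all levels".
        and (not seeker_levels or bool(d.levels & seeker_levels))
    ]


q = "what is a merced microprocessor"
destinations = [
    Destination("http://example.com/buyers-guide", frozenset({q}),
                frozenset({"very easy", "easy"})),
    Destination("http://example.com/bus-timing", frozenset({q}),
                frozenset({"very complex or technical"})),
]
print(select(destinations, q, {"easy"}))
# ['http://example.com/buyers-guide']
print(select(destinations, q, {"very complex or technical"}))
# ['http://example.com/bus-timing']
```

Both destinations satisfy the same content criteria, so only the filter values separate the consumer's answer from the circuit designer's.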

The features and advantages of the present invention may be better understood
with reference to the figures and discussions that follow. FIG. 2A is an exemplary
user interface that permits a user to enter one or more questions that may be used to
search and retrieve information from the Internet according to one embodiment of the
present invention. As shown, a prompt 202 may be provided which requests that
questions submitted be in plain English. One or more questions may then be entered
by an Internet user. Each question 204 indicates information desired to be retrieved
from the Internet. By way of example, a first question 204-1 requests information
relating to "Who fixes or repairs heat pumps in Eugene, OR?" To assist the user in
this process, a pull-down menu may be provided which allows the user to select from
the most recently entered questions. Moreover, the searching process is simplified
since the user may request information through the use of questions rather than
Boolean search terms.

A person asking a question in plain English will typically ask a limited 
number of types of questions. As shown in FIG. 2A, each question typically includes 
what may be referred to as a question prefix 206. The question prefix 206 may be one 
of a number of terms such as "who", "what", "when", "why", "where", and "how". 
By way of example, for the first question 204-1, a first question prefix 206-1 may be 
"Who." The question prefix 206 for each question submitted by the user may be 
provided as part of the interface as shown in FIG. 2A. Alternatively, the question 
prefix 206 may be submitted by the user as part of the question 204. The user may 
enter any number of questions for a given question prefix. For example, if a user 
wishes to find a car repair shop in Palo Alto, the user may enter questions such as 
"Who repairs cars in Palo Alto?" or "Who repairs Mercedes in Palo Alto?" 
Therefore, various question prefixes may be more desirable than others for various 
purposes. Moreover, as shown, questions provided by the user may be unrelated as 
well as related. 
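
Separating a submitted question into its question prefix 206 and the remainder could be done as sketched below. The prefix list comes from the text above; treating the first word as the candidate prefix and ignoring trailing punctuation on it are assumptions of this illustration. When the interface of FIG. 2A supplies the prefix separately, no such parsing is needed.

```python
QUESTION_PREFIXES = ("who", "what", "when", "why", "where", "how")


def split_question(question):
    """Return (prefix, remainder) if the question opens with a known prefix, else (None, question)."""
    text = question.strip()
    words = text.split(maxsplit=1)
    # Assumed rule: the first word, minus trailing punctuation, is the candidate prefix.
    first = words[0].lower().rstrip("?,.!") if words else ""
    if first in QUESTION_PREFIXES:
        remainder = words[1] if len(words) > 1 else ""
        return first, remainder
    return None, text


print(split_question("Who fixes or repairs heat pumps in Eugene, OR?"))
# ('who', 'fixes or repairs heat pumps in Eugene, OR?')
print(split_question("Repairs Mercedes in Palo Alto"))
# (None, 'Repairs Mercedes in Palo Alto')
```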

The information and characteristics of the information that is provided on the
Internet vary from web site to web site, as well as within each web site. As
described above, a search may yield search results that are incompatible with the
needs of the Internet user although, technically speaking, the content of the
information is relevant to the query. An example of this is when a technically
unsophisticated consumer is furnished with highly technical information pertaining to
the arithmetic logic unit (ALU) of the Intel-based Merced™ microprocessor in
response to the question "what is a merced microprocessor?" The user may therefore
wish to further filter the search results that may be obtained through the use of the
previously entered questions.

FIG. 2B is an exemplary user interface that permits a user to enter a filter 
selection in order to filter information retrieved from the Internet according to one 
embodiment of the present invention. The optional filter selection may be used for a 
variety of purposes to filter the information that is retrieved. By way of example, the 
filter selection may be used to specify one or more levels of complexity for the 
information that is ultimately retrieved. As yet another example, the filter selection 
may be used to specify one or more educational levels for which the information 
retrieved will be most appropriate. Thus, children in grammar school searching for 
information may obtain an "answer" to their question that is most suitable for their 
purposes. Similarly, a graduate student doing research on the same or a similar topic 
may obtain information on that topic of a higher level of detail and complexity. 

As shown in FIG. 2B, a first exemplary filter selection 208 designed to filter
the information desired to be retrieved is illustrated. The first filter selection 208
permits the user to specify at least one educational level associated with the
information desired to be retrieved. As shown, one or more educational levels 210
from which to select the first filter selection 208 may be provided to the user. By way
of example, the educational levels 210 may include pre-kindergarten, kindergarten
through 5th grade, 6th through 12th grade, college, post-collegiate, or all of the above.
By way of example, the educational levels 210 provided may be used to indicate the
user's ability or inability to read English. The user may then specify one or more
selections 212 from these educational levels 210. By way of example, the user may
mark a box corresponding to the appropriate educational levels 210.

In addition, a second exemplary filter selection 214 designed to filter the
information desired to be retrieved is shown. The second filter selection 214 permits
the user to specify at least one complexity, or technical, level associated with the
information desired to be retrieved. As shown, one or more complexity or technical
levels 216 from which to select the second filter selection 214 may be provided to the
user. By way of example, the complexity or technical levels 216 may include very
easy, easy, average, complex or technical, very complex or technical, or include all
levels. By way of example, for a user searching for information on "nurseries", the
level "very easy" may yield information related to children's nurseries rather than
gardens. Thus, these complexity or technical levels 216 may be associated with the
subject matter of the information being retrieved. The user may then specify one or
more selections 218 from these complexity or technical levels 216. As described
above, the user may designate one or more selections corresponding to the desired
levels 216 using a mouse or other device. Moreover, such filter selections may be
exclusive as well as inclusive. Similarly, expressions such as Boolean or
mathematical expressions may be used during the filtering process. By way of
example, Boolean expressions may be applied to select information that is appropriate
for both a 6th grader AND a college graduate. As yet another example, mathematical
expressions may be applied to select information that is appropriate for a 6th grader or
someone who is LESS educated. Although only two exemplary filter selections are
described above, further options may be provided to the user such as those which
would specifically permit access to, or deny access to, pornographic material.
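
Because the educational levels 210 form an ordered scale, the Boolean and mathematical filter expressions mentioned above can be sketched with an ordered enumeration. The numeric ordering, the mapping of "a 6th grader" to the 6th through 12th grade band, and the helper names are assumptions for illustration.

```python
from enum import IntEnum


class Education(IntEnum):
    """Educational levels from FIG. 2B; the numeric ordering is an assumption."""
    PRE_K = 0
    K_TO_5 = 1
    GRADES_6_TO_12 = 2
    COLLEGE = 3
    POST_COLLEGIATE = 4


def appropriate_for_all(entry_levels, *required):
    """Boolean AND: the entry must be marked appropriate for every required level."""
    return all(level in entry_levels for level in required)


def appropriate_at_or_below(entry_levels, ceiling):
    """Comparison: the entry must be marked appropriate for some level <= ceiling."""
    return any(level <= ceiling for level in entry_levels)


# Provider-declared levels for a hypothetical page.
page = {Education.GRADES_6_TO_12, Education.COLLEGE}
print(appropriate_for_all(page, Education.GRADES_6_TO_12, Education.COLLEGE))  # True
print(appropriate_at_or_below({Education.COLLEGE}, Education.GRADES_6_TO_12))   # False
```

Using an ordered type lets "LESS educated" be expressed as an ordinary comparison instead of enumerating every lower level by hand.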

Filter selections may be used for a variety of purposes. For instance, such a 
filtering mechanism may be used to provide parents with the ability to set these filter 
selections such that pornographic or other inappropriate material cannot be accessed 
by their children. By way of example, this may be accomplished by providing a
separate user interface through which the filter selections are set. As yet another
example, the filter selections may be protected by a password or other security
mechanism so that only an authorized user can change them. In this manner, the age
appropriateness and complexity of the content of various web sites or information
made available on the Internet may be filtered according to the user's specifications.
In addition, further filtering mechanisms may be applied to obtain different categories
of information (e.g., state, distance from the user). Appropriate links may then be
followed to obtain additional information required for the particular category (e.g.,
from mapping programs). Accordingly, the search may be tailored to the specific
needs of the Internet user.
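
One way to realize the password protection mentioned above is to store only a salted hash of the parental password and to refuse any change to the filter selections unless the supplied password matches. The sketch below assumes that design; the `ProtectedFilters` class, its category names, and the iteration count are hypothetical. It relies only on the Python standard library (`hashlib.pbkdf2_hmac`, `hmac.compare_digest`, `os.urandom`).

```python
import hashlib
import hmac
import os
from typing import Dict, FrozenSet, Iterable


class ProtectedFilters:
    """Hypothetical parental-control store: selections change only with the password."""

    ITERATIONS = 100_000  # assumed PBKDF2 work factor

    def __init__(self, password: str, selections: Dict[str, Iterable[str]]):
        self._salt = os.urandom(16)              # random per-store salt
        self._digest = self._derive(password)    # only the derived key is kept
        self._selections = {k: frozenset(v) for k, v in selections.items()}

    def _derive(self, password: str) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                   self._salt, self.ITERATIONS)

    @property
    def selections(self) -> Dict[str, FrozenSet[str]]:
        """Read-only copy consulted when search results are filtered."""
        return dict(self._selections)

    def update(self, password: str, category: str, values: Iterable[str]) -> bool:
        """Replace one category's selections; return False if the password is wrong."""
        # Constant-time comparison avoids leaking how much of the key matched.
        if not hmac.compare_digest(self._derive(password), self._digest):
            return False
        self._selections[category] = frozenset(values)
        return True
```

A child's session could read `selections` freely, while only the holder of the password could call `update` successfully.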

Each question entered by the user, together with the associated filter selections, may
then be accepted when the user submits the query 220. A database containing
information submitted by one or more information providers may then be searched for
information related to the submitted query. Upon completion of the search, an
"answer" to the question is provided to the user.

As described above, it would be unwieldy to search through all web sites on 
the Internet. Therefore, a database is maintained by a central service that provides 
access to the present invention. Rather than containing information related to all web
sites on the Internet, this database contains entries for only those web sites or 
"targets" (i.e., destinations) submitted by information providers. More particularly, 
these entries associate a web site or "target" with one or more questions submitted by 
an information provider. It is important to recognize that the information provider is 
in the best position to judge the content (e.g., complexity) of a particular web site or 
target. Moreover, the information provider would have an interest in assisting an 
Internet user in locating the particular web site. In addition, the information provider 
is in the best position to predict the type of question the average consumer might ask. 
Therefore, the information provider (i.e., creator of the web site) may create 
appropriate questions and filter values characterizing the web site or information 
submitted to the service. These questions, filter values, and the corresponding 
destination (e.g., web site) may then be furnished by the information provider to the 
service for entry into the system database. 

FIG. 3A is an exemplary interface that permits an information provider to 
enter one or more questions associated with information to be made available on the 
Internet for retrieval by a user according to one embodiment of the present invention. 
As shown, a prompt 302 is provided which requests that the information provider 
submit questions characterizing a particular destination 304. As described above, the
destination 304 contains an answer to those questions provided by the information
provider. The destination 304 may therefore serve as a potential target for the search
performed by the Internet user. The destination 304 may include a URL for a web
site 306, a web page or file 308, a target or position within the web page 310 (e.g., a
paragraph), or a file containing contact information 312, which may be entered in a
scrolling window, as shown. One or more questions may be entered by the
information provider for each destination. Each question 314 may therefore be
associated with information to be made available on the Internet for retrieval.
By way of example, a first question 314-1 asks "Who fixes or repairs heat pumps in
Eugene, OR?" In this manner, information providers may furnish the questions that a
user is most likely to ask while searching the Internet. Moreover, multiple variations
of the same question may be supplied to increase the probability that the search will
be successful. Similarly, multiple questions may be used to cover varying scopes of
the same question. By way of example, a user may ask "What are the best restaurants
in the Bay Area?" as well as "What are the best restaurants in San Francisco?"

As shown in FIG. 3A, each question may include a question prefix 316. The
question prefix 316 may be one of the terms "who", "what", "when", "why",
"where", and "how". By way of example, for the first question 314-1, a first question
prefix 316-1 may be "Who." The question prefix 316 for each question submitted by
the information provider may be provided as part of the interface, as shown in
FIG. 3A. Alternatively, the question prefix 316 may be submitted by the information
provider as part of the question 314. The information provider may enter any number
of questions for a given question prefix. For example, as shown, if an information
provider wishes to make information related to heat pumps available on the Internet, a
variety of questions associated with such information may be entered and linked to
this information through the use of the present invention.
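
Because the prefix may arrive either from a separate field of the FIG. 3A interface or as the first word of the question itself, a submission handler would likely normalize both forms into a single (prefix, remainder) pair before storage. A minimal sketch, assuming the six prefixes listed above; `split_prefix` is a hypothetical helper name.

```python
from typing import Optional, Tuple

PREFIXES = ("who", "what", "when", "why", "where", "how")


def split_prefix(question: str, prefix: Optional[str] = None) -> Tuple[str, str]:
    """Return (prefix, remainder) for a provider-submitted question.

    `prefix` holds the value of a separate interface field (e.g. "WHO:");
    when it is omitted, the first word of `question` is taken as the prefix.
    """
    text = question.strip()
    if prefix is None:
        first, _, text = text.partition(" ")
        prefix = first
    normalized = prefix.strip().rstrip(":").lower()
    if normalized not in PREFIXES:
        raise ValueError(f"question prefix must be one of {PREFIXES}, got {prefix!r}")
    return normalized, text.strip()
```

Both `split_prefix("fixes or repairs heat pumps in Eugene, OR?", "WHO:")` and `split_prefix("Who fixes or repairs heat pumps in Eugene, OR?")` yield the same stored pair, so either entry style leads to the same database entry.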

The information provider may enter one or more optional filter values to 
designate the appropriate audience for the particular destination provided by the 
information provider. As described above, these corresponding filter values may be 
exclusive as well as inclusive. In this manner, information retrieved in response to a
user query that matches one or more of the questions submitted by the information
provider may be filtered. FIG. 3B is an exemplary interface that permits an information
provider to enter one or more filter values designed to permit information submitted
by the information provider to be filtered upon retrieval by a user according to one
embodiment of the present invention. More particularly, filter values submitted by
the information provider are designed to filter the information according to the
user-provided filter selections. By way of example, a filter value may be used to specify
one or more levels of complexity (e.g., technical levels) that characterize the 
information that is submitted by the information provider. As yet another example, a 
filter value may be used to specify one or more educational levels to indicate the age 
or educational level for which the information will be most appropriate. 

As shown in FIG. 3B, a first exemplary filter value 318 designed to
characterize the information submitted by the information provider is illustrated. The
first filter value 318 permits the information provider to specify at least one
educational level associated with the information. As shown, one or more educational
levels 320 from which to select the first filter value 318 may be provided to the
information provider. The information provider may then specify one or more
selections 322 from these educational levels 320.

In addition, a second exemplary filter value 324 designed to characterize the 
information submitted by the information provider and facilitate later retrieval of the 
information is shown. The second filter value 324 permits the information provider 
to specify at least one complexity, or technical, level associated with the information 
submitted. As shown, one or more complexity or technical levels 326 from which to 
select the second filter value 324 may be provided to the information provider. The 
information provider may then specify one or more selections 328 from these 
complexity or technical levels 326. As shown, the filter values specified by the 
information provider are selected from choices made available to the user upon 
specifying the corresponding filter selections, as described above. In this manner, 
information submitted to the search service by the information provider may be 
appropriately "categorized" according to these filter values to permit later retrieval by 
a user. 
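
Because the provider's filter values are chosen from the same lists the user sees, matching can be sketched as a per-category set intersection. This is an illustrative assumption rather than the described implementation: `passes_filter` and `entry_passes` are hypothetical names, and treating an empty or "Include All" choice as imposing no restriction is an assumed convention.

```python
from typing import Dict, Set

INCLUDE_ALL = {"Include All", "Include All Levels"}  # option labels from FIGS. 2B and 3B


def passes_filter(provider_values: Set[str], user_selections: Set[str]) -> bool:
    """Inclusive match for one filter category (e.g. education or complexity)."""
    # Assumption: an empty or "Include All" choice on either side imposes no restriction.
    if not provider_values or not user_selections:
        return True
    if (provider_values | user_selections) & INCLUDE_ALL:
        return True
    # Otherwise at least one provider-specified level must have been selected by the user.
    return bool(provider_values & user_selections)


def entry_passes(entry_filters: Dict[str, Set[str]], user_filters: Dict[str, Set[str]]) -> bool:
    """An entry is returned only if it passes every filter category independently."""
    categories = set(entry_filters) | set(user_filters)
    return all(passes_filter(entry_filters.get(c, set()), user_filters.get(c, set()))
               for c in categories)
```

With the FIG. 2B selections (Pre-K and K - 5th Grade; Easy and Average), an entry marked "K - 5th Grade" and "Easy" would pass, while one marked only "Very Complex or Technical" would be filtered out.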

Each question entered may then be associated with the filter values and the 
destination as specified by the information provider. FIG. 4 illustrates an exemplary 
database engine in which information submitted by the information provider may be 
stored according to one embodiment of the present invention. As shown, the database 
may include a plurality of entries. Each one of the plurality of entries 402 may store a 
question 404 submitted by the information provider, a destination 406 containing an 
"answer" to the question 404, and one or more optional filter values such as 
complexity 408 and education 410. In this manner, the filter values 408, 410 and the
destination 406 may be associated with the question 404. As described above, each
question may include a question prefix. Since searching is performed through the
database maintained by the searching service rather than through the entire Internet,
search time is substantially less than that of standard Internet search engines.
Moreover, since the database is a compilation of information submitted by the web
site creators, the information is most likely to be retrieved in the manner desired by
both the web site creators and the user requesting the information.
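
As a sketch only, one entry 402 of FIG. 4 might be modeled as an immutable record, with a provider's submission (several questions, one destination) expanded into one entry per question. The field layout, `Entry`, and `entries_for_submission` are hypothetical; the destination URL is the placeholder shown in FIG. 3A.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Entry:
    """One hypothetical row of the FIG. 4 database."""
    prefix: str                                # question prefix, e.g. "who"
    question: str                              # question 404 (text after the prefix)
    destination: str                           # destination 406: URL, page, target, or contact file
    complexity: FrozenSet[str] = frozenset()   # optional filter value 408
    education: FrozenSet[str] = frozenset()    # optional filter value 410


def entries_for_submission(destination: str,
                           questions: Iterable[Tuple[str, str]],
                           complexity: Iterable[str] = (),
                           education: Iterable[str] = ()) -> List[Entry]:
    """Expand one provider submission into entries sharing one destination and filters."""
    cx, ed = frozenset(complexity), frozenset(education)  # materialize once
    return [Entry(prefix, question, destination, cx, ed) for prefix, question in questions]
```

Storing one row per question, rather than one row per destination, is what lets a single destination answer many differently phrased questions.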

Each information provider may submit its information to the Internet search 
service for retrieval by Internet users. A service fee may be charged to the information
provider upon submission of the information, or per kilobyte of memory required to
store its entries in the database. Moreover, a service fee may be charged each time an
Internet user accesses the information.

FIG. 5 is a diagram illustrating an exemplary question submitted by a user and its
parsing into search terms. By way of example, a question 502, "Who repairs Hondas
in Palo Alto?", may be parsed into appropriate terms and phrases, as shown. The
database may then be searched using these terms and phrases.
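
The parsing of FIG. 5 might be approximated by lowercasing the question, separating the prefix, dropping filler words, and keeping known multi-word names together. This is a minimal sketch, not the described parser: the stopword list, the `KNOWN_PHRASES` table, and `parse_question` are hypothetical, and tagging each term with a descriptor (e.g., verb or noun) is omitted.

```python
import re
from typing import List, Optional, Tuple

PREFIXES = {"who", "what", "when", "why", "where", "how"}
# Hypothetical word lists; a real service would maintain much larger ones.
STOPWORDS = {"a", "an", "the", "in", "of", "on", "for", "to", "is", "are",
             "do", "does", "can", "i", "my"}
KNOWN_PHRASES = {("palo", "alto"), ("heat", "pumps"), ("bay", "area")}


def parse_question(question: str) -> Tuple[Optional[str], List[str]]:
    """Split a user question into its prefix and a list of search terms and phrases."""
    words = re.findall(r"[a-z0-9']+", question.lower())
    prefix = words[0] if words and words[0] in PREFIXES else None
    rest = words[1:] if prefix else words
    terms: List[str] = []
    i = 0
    while i < len(rest):
        pair = tuple(rest[i:i + 2])
        if pair in KNOWN_PHRASES:          # keep multi-word phrases together
            terms.append(" ".join(pair))
            i += 2
        else:
            if rest[i] not in STOPWORDS:   # drop words that carry no search value
                terms.append(rest[i])
            i += 1
    return prefix, terms
```

Under these word lists, `parse_question("Who repairs Hondas in Palo Alto?")` returns `("who", ["repairs", "hondas", "palo alto"])`.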

Each question submitted by the user may be parsed and used to search a 
database for the appropriate information. FIG. 6 is a flow diagram illustrating a 
method for searching the database according to one embodiment of the present 
invention. The process begins at step 602. The question may be parsed into its 
question prefix at step 604 and a plurality of search elements at step 606. By way of 
example, each of the search elements may be parsed such that each term or phrase is 
associated with an appropriate descriptor (e.g., verb or noun). The database may then
be searched at step 608 for at least one entry associated with the question prefix and
the plurality of search elements obtained in steps 604 and 606. To facilitate efficient
searching and retrieval, the database may be a relational database which may be
indexed prior to searching. In this manner, similar entries (e.g., questions) may be
efficiently located. The selected entries may then be retrieved at step 610. The
process is completed at step 612.
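
The indexed relational database suggested for steps 608 and 610 can be sketched with Python's built-in `sqlite3` module. The two-table schema (one row per entry, one row per parsed term), the index names, and the example.com destinations are hypothetical illustrations; this version returns any entry sharing the prefix and at least one search element, leaving ranking aside.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[str, str, str, Sequence[str]]  # (prefix, question, destination, parsed terms)


def build_db(rows: Iterable[Row]) -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, prefix TEXT,"
                " question TEXT, destination TEXT)")
    con.execute("CREATE TABLE terms (entry_id INTEGER REFERENCES entries(id), term TEXT)")
    for prefix, question, destination, terms in rows:
        cur = con.execute("INSERT INTO entries (prefix, question, destination) VALUES (?, ?, ?)",
                          (prefix, question, destination))
        con.executemany("INSERT INTO terms (entry_id, term) VALUES (?, ?)",
                        [(cur.lastrowid, t) for t in terms])
    # Index prior to searching so prefix and term lookups avoid full table scans.
    con.execute("CREATE INDEX idx_entries_prefix ON entries (prefix)")
    con.execute("CREATE INDEX idx_terms_term ON terms (term, entry_id)")
    return con


def search(con: sqlite3.Connection, prefix: str, elements: Sequence[str]) -> List[str]:
    """Step 608: destinations of entries with this prefix sharing any search element."""
    if not elements:
        return []
    placeholders = ", ".join("?" for _ in elements)
    sql = ("SELECT destination FROM entries WHERE prefix = ? AND id IN "
           f"(SELECT entry_id FROM terms WHERE term IN ({placeholders})) ORDER BY id")
    # Step 610: retrieve the selected entries.
    return [row[0] for row in con.execute(sql, (prefix, *elements))]
```

Only the placeholder count is formatted into the SQL string; the search elements themselves are always passed as bound parameters.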

One method for searching the database as shown in step 608 of FIG. 6 is 
illustrated in FIG. 7. As shown, FIG. 7 is a flow diagram illustrating a method for 
searching the database for entries, or questions, containing the search elements 
according to one embodiment of the invention. As shown, the method begins at step 
702. Entries having the desired parsed question prefix may be obtained from the 
database at step 704. A next one of the search elements may then be obtained from 
the parsed question at step 706. A search is then performed within the obtained 
entries for the next one of the search elements at step 708. At step 710, if it is 
determined that a search has been performed for all parsed search elements, the 
process is completed at step 712. Alternatively, if a search has not been performed 
for all parsed search elements, the process is repeated for each remaining parsed 
search element at step 706. Although the search is described as being performed 
consecutively for each parsed question prefix and search element, searches may be 
performed in parallel. During the search, the relevant entries may be ranked 
according to the number of associations between the search terms and the selected 
database entries. Similarly, irrelevant entries may be eliminated from consideration. 
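
The loop of FIG. 7 (steps 704 through 710), together with ranking by the number of associations, can be sketched in a few lines. As an illustration only, this assumes the entries are already in memory as (id, prefix, terms) tuples; `rank_entries` is a hypothetical name, and the sequential loop could equally be split across workers, since the searches may be performed in parallel.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

EntryRow = Tuple[int, str, Set[str]]  # (entry id, question prefix, parsed terms)


def rank_entries(entries: Iterable[EntryRow], prefix: str, elements: Iterable[str]) -> List[int]:
    # Step 704: obtain only the entries that share the parsed question prefix.
    candidates = {entry_id: terms for entry_id, p, terms in entries if p == prefix}
    scores: Counter = Counter()
    # Steps 706-710: search the obtained entries for each search element in turn.
    for element in elements:
        for entry_id, terms in candidates.items():
            if element in terms:
                scores[entry_id] += 1
    # Entries with no associations never enter `scores`, so they are eliminated;
    # the rest are ranked by association count, ties broken by entry id.
    return sorted(scores, key=lambda entry_id: (-scores[entry_id], entry_id))
```

For the question of FIG. 5, an entry containing all three terms would rank ahead of one sharing only "repairs" and "palo alto", and an entry sharing none would be dropped.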

Once the appropriate entries are retrieved, the "answer" associated with the
question may be provided. Where a large number of entries are retrieved, a subset of
the retrieved entries may be selected. By way of example, it may be preferable to
select a percentage of the entries. This subset may be selected according to various
criteria, such as the relevance of the entries to the user-provided question. The
relevance of the entries may be determined by counting the search terms that are the
same as, or equivalent to, the parsed terms in the user-provided question. Thus, it may
not be necessary to obtain an exact match between the question submitted by the
information provider and the question entered by the user.
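
Selecting a percentage of a large result set by relevance might look like the following sketch. It assumes relevance is the plain count of shared terms (equivalent terms are ignored here) and that at least one entry is always kept; `relevance`, `select_top_percent`, and the default of 10 percent are hypothetical choices.

```python
import math
from typing import Iterable, List, Set, Tuple


def relevance(query_terms: Set[str], entry_terms: Set[str]) -> int:
    """Number of parsed query terms that also appear in the entry."""
    return len(query_terms & entry_terms)


def select_top_percent(query_terms: Iterable[str],
                       entries: Iterable[Tuple[str, Set[str]]],
                       percent: float = 10.0) -> List[str]:
    """Return destinations of the most relevant `percent` of matching entries (at least one)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    query = set(query_terms)
    scored = [(relevance(query, set(terms)), dest) for dest, terms in entries]
    scored = [item for item in scored if item[0] > 0]     # drop entries with no overlap
    scored.sort(key=lambda item: item[0], reverse=True)   # stable: ties keep input order
    if not scored:
        return []
    keep = max(1, math.ceil(len(scored) * percent / 100))
    return [dest for _, dest in scored[:keep]]
```

Rounding the cut-off up with `math.ceil` ensures that a small result set is never reduced to nothing.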

Rather than submitting numerous variations of the same question, the 
information provider may wish to define one or more sets of equivalent terms. These 
equivalent terms may be applied during searching, retrieval of entries from the 
database, and ranking of the retrieved entries. FIG. 8 is a diagram illustrating 
potential word equivalencies. By way of example, when a parsed term, 
"information", is searched in the service database, the terms "data" and "news" may 
be interchangeable. In this manner, the question provided by the user need only be 
"equivalent" to a question provided by an information provider, rather than identical. 
By way of example, if a user wishes to obtain information relating to restaurants in 
the Midwest, "Midwest" may be associated with states in the Midwest to facilitate the 
searching process. Moreover, these equivalent terms may be assigned different values 
to permit ranking of the entries prior to providing the information to the Internet user. 
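
Weighted equivalent terms, as in FIG. 8, can be sketched as a lookup that gives an identical term full weight and a provider-declared equivalent a lesser weight. The `EQUIVALENTS` table, its weights, the Midwest state list, and the function names are hypothetical; matching in either direction is an assumption made so that a query term can match an entry term for which the provider declared it equivalent.

```python
from typing import Dict, Iterable

# Hypothetical provider-defined equivalence sets with weights in (0, 1].
EQUIVALENTS: Dict[str, Dict[str, float]] = {
    "information": {"data": 0.8, "news": 0.6},
    "midwest": {"ohio": 0.9, "iowa": 0.9, "illinois": 0.9},
}


def equivalence_weight(a: str, b: str, equivalents: Dict[str, Dict[str, float]]) -> float:
    """1.0 for identical terms, the declared weight for equivalents (either direction), else 0."""
    if a == b:
        return 1.0
    return max(equivalents.get(a, {}).get(b, 0.0), equivalents.get(b, {}).get(a, 0.0))


def weighted_score(query_terms: Iterable[str], entry_terms: Iterable[str],
                   equivalents: Dict[str, Dict[str, float]] = EQUIVALENTS) -> float:
    """Sum, over query terms, of the best weight with which each reaches some entry term."""
    entry = list(entry_terms)
    return sum(max((equivalence_weight(q, e, equivalents) for e in entry), default=0.0)
               for q in query_terms)
```

Under these weights, a query term "data" scores 0.8 against an entry containing "information", so an exact wording match still outranks an equivalent one.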

The present invention provides an accurate and efficient system for providing 
an Internet user with requested information. Since a user may enter a natural 
language query in the form of a question, the system is user-friendly to a wide
audience of computer users. Moreover, the knowledge and experience of the creator
of the web site is leveraged most effectively to compile and maintain a system
database, so that subsequent searches are as accurate and effective as possible. As a
result, the "false hits" that typically occur during the use of standard Internet search
engines may be substantially reduced through the use of the present invention.
Furthermore, because the information providers themselves characterize their web
sites, it is unnecessary for the Internet search service to research or crawl the
web sites that are submitted to the service. As a result, the administrative support that
must be provided by the service is minimized. Moreover, real-time updates may be
made to the database to permit information to be efficiently and accurately retrieved.

The invention can also be embodied as computer readable code on a computer 
readable medium. The computer readable medium is any data storage device that can 
store data which can thereafter be read by a computer system. Examples of the 
computer readable medium include read-only memory, random-access memory,
CD-ROMs, magnetic tape, and optical data storage devices.

Although illustrative embodiments and applications of this invention are 
shown and described herein, many variations and modifications are possible which 
remain within the concept, scope, and spirit of the invention, and these variations 
would become clear to those of ordinary skill in the art after perusal of this 
application. For instance, the present invention is described as permitting retrieved 
information to be filtered according to the complexity of the information as well as 
the selected educational levels. However, it should be understood that the present 
invention is not limited to this arrangement, but instead would equally apply 
regardless of the categories in which the information is filtered. Accordingly, the
present embodiments are to be considered as illustrative and not restrictive, and the
invention is not to be limited to the details given herein, but may be modified within
the scope and equivalents of the appended claims.






CLAIMS 






What is claimed is: 

1. A computer-implemented method implemented in an information retrieval
system for providing first information via the Internet to an information seeker,
comprising:

receiving a query from said information seeker via said Internet; 

comparing said query with a database of characterizing data entries, said 
characterizing data entries representing characterizing data items previously 
submitted to said information retrieval system by information providers for storing 
within said database, said information providers representing entities wishing to 
provide information through said Internet to Internet users, each of said characterizing 
data items being associated with at least one destination data item; and 

if a correspondence between said query and a first characterizing data entry of 
said characterizing data entries in said database is found, employing a first destination 
data item associated with said first characterizing data entry to provide said 
information seeker with said first information. 

2. The computer-implemented method of claim 1 wherein said information 
providers represent entities other than said information retrieval system. 

3. The computer-implemented method of claim 1 wherein said information
providers include entities, other than an entity implementing said computer-implemented
method, that are responsible for updating contents of websites and
webpages containing information to be accessed by said Internet users.



4. The computer-implemented method of claim 1 wherein said information 
providers represent administrators of websites coupled to said Internet. 

5. The computer-implemented method of claim 4 wherein said websites are 
different from the website implementing said information retrieval system. 



6. The computer-implemented method of claim 1 wherein said first destination
data item is a Uniform Resource Locator (URL) for a webpage.

7. The computer-implemented method of claim 1 wherein said first destination 
data item is a Uniform Resource Locator (URL) pointing to a specific portion of a 
webpage. 


8. The computer-implemented method of claim 1 wherein said first destination 
data item is a data file retrieved from a website external to a website implementing 
said computer-implemented method. 

9. The computer-implemented method of claim 1 wherein said first destination
data item is associated with more than one of said characterizing data entries.

10. The computer-implemented method of claim 1 wherein said query is in the 
form of a question. 

11. The computer-implemented method of claim 1 wherein said characterizing
data entries include data entries in a question format.

12. The computer-implemented method of claim 1 wherein said characterizing 
data entries are associated with filtering data. 

13. The computer-implemented method of claim 12 wherein said first 
characterizing data entry includes entries pertaining to equivalent terms, said
equivalent terms representing terms that are different but deemed by an information
provider associated with said first characterizing data entry to be equivalent to a term
in said first characterizing data entry, said equivalent terms causing said comparing to
produce said correspondence if a term in said query matches one of said equivalent
terms even if an exact match between said term in said query and said term in said 
first characterizing data entry is not found. 

14. The computer-implemented method of claim 12 wherein filtering data 
associated with a given one of said characterizing data entries includes data pertaining 
to a level of technical sophistication of information associated with a given 
destination data item, said given destination data item being associated with said 
given one of said characterizing data entries in said database. 

15. The computer-implemented method of claim 12 wherein filtering data 
associated with a given one of said characterizing data entries includes data pertaining 
to a level of education appropriate for information associated with a given destination
data item, said given destination data item being associated with said given one of
said characterizing data entries in said database.

16. A computer-implemented method for deriving revenue from Internet 
information providers responsive to queries by Internet users, comprising: 

receiving characterizing data entries submitted by said Internet information
providers, each of said characterizing data entries corresponding to at least one
destination data item;

storing said characterizing data entries in a database; 

receiving, via the Internet, a query from an information seeker, said
information seeker being one of said Internet users;

comparing said query against said characterizing data entries for a 
correspondence; and 

if said comparing produces a correspondence between said query and a first 
characterizing data entry of said characterizing data entries, charging a first Internet 
information provider of said Internet information providers a given amount, said first 
Internet information provider being associated with said first characterizing data 
entry. 

17. The computer-implemented method of claim 16 further comprising employing 
a first destination data item associated with said first characterizing data entry to 
provide information to said information seeker. 

18. The computer-implemented method of claim 16 wherein said Internet
information providers represent administrators of websites coupled to said Internet.

19. The computer-implemented method of claim 16 wherein said first destination
data item is a Uniform Resource Locator (URL) for a webpage.

20. The computer-implemented method of claim 16 wherein said first destination
data item is a Uniform Resource Locator (URL) pointing to a specific portion of a 
webpage. 


21. The computer-implemented method of claim 16 wherein said first destination 
data item is a data file retrieved from a website external to a website implementing 
said computer-implemented method. 

22. The computer-implemented method of claim 16 wherein said first destination
data item is associated with more than one of said characterizing data entries.

23. The computer-implemented method of claim 16 wherein said query is in the 
form of a question. 


24. The computer-implemented method of claim 16 wherein said characterizing 
data entries include data entries in a question format. 

25. The computer-implemented method of claim 24 wherein said first
characterizing data entry includes entries pertaining to equivalent terms, said
equivalent terms representing terms that are different but deemed by said first Internet
information provider to be equivalent to a term in said first characterizing data entry,
said equivalent terms causing said comparing to produce said correspondence if a
term in said query matches one of said equivalent terms even if an exact match
between said term in said query and said term in said first characterizing data entry is
not found.

26. The computer-implemented method of claim 16 wherein said characterizing
data entries are associated with filtering data.

27. An information retrieval system for providing first information via the Internet
to an information seeker, comprising:

means for receiving characterizing data entries and associated destination data 
items submitted by information providers, said information providers representing 
entities wishing to provide information through said Internet to Internet users; 

means for storing said characterizing data entries and said associated 
destination data items, each of said characterizing data items being associated with at 
least one of said associated destination data items; 

means for receiving a query from said information seeker via said Internet; and

means for comparing said query with said characterizing data entries to find a
correspondence between said query and a first characterizing data entry of said
characterizing data entries.

28. The information retrieval system of claim 27 further including means for
employing a first destination data item associated with said first characterizing data
entry to provide said information seeker with said first information if there is a
correspondence between said query and said first characterizing data entry of said
characterizing data entries.

29. The information retrieval system of claim 27 wherein said information 
providers represent administrators of commercial websites coupled to said Internet. 


30. The information retrieval system of claim 29 wherein said websites are 
different from a website implementing said information retrieval system. 



31. The information retrieval system of claim 27 wherein said first destination
data item is a Uniform Resource Locator (URL) for a webpage.

32. The information retrieval system of claim 27 wherein said first destination 
data item is a Uniform Resource Locator (URL) pointing to a specific portion of a 
webpage. 


33. The information retrieval system of claim 27 wherein said first destination 
data item is a data file retrieved from a website external to a website implementing 
said information retrieval system. 

34. The information retrieval system of claim 27 wherein said first destination
data item is associated with more than one of said characterizing data entries.



35. The information retrieval system of claim 27 wherein said query is in the form 
of a question. 

36. The information retrieval system of claim 27 wherein said characterizing data 
entries include data entries in a question format. 

37. The information retrieval system of claim 36 wherein said characterizing data 
entries are associated with filtering data. 

38. The information retrieval system of claim 36 wherein said first characterizing
data entry includes entries pertaining to equivalent terms, said equivalent terms
representing terms that are different but deemed by an information provider 
associated with said first characterizing data entry to be equivalent to a term in said 
first characterizing data entry, said equivalent terms causing said comparing to 
produce said correspondence if a term in said query matches one of said equivalent 
terms even if an exact match between said term in said query and said term in said 
first characterizing data entry is not found. 

39. The information retrieval system of claim 37 wherein filtering data associated 
with a given one of said characterizing data entries includes data pertaining to a level 
of technical sophistication of information associated with a given destination data 
item, said given destination data item being associated with said given one of said
characterizing data entries in said database.



40. The information retrieval system of claim 37 wherein filtering data associated
with a given one of said characterizing data entries includes data pertaining to a level
of education appropriate for information associated with a given destination data item,
said given destination data item being associated with said given one of said
characterizing data entries in said database.

202  Ask your question(s) below: IN PLAIN ENGLISH
     Fill in Any or All Queries.

WHO: fixes or repairs heat pumps in Eugene, OR?   [206, 206-1; 204, 204-1]
WHAT: is a Merced Microprocessor?
WHEN: was digital cable offered in Eugene, OR?
WHY: does my Hewlett Packard printer misfeed paper?
WHERE: can I buy a used BMW in the Northwest?
HOW: can I drill through metal?

FIG. 2A




Select Age/Educational Level(s)   [208]
Educational Level(s) [210]              Indicate Selections [212]
  Pre-K                                 X
  K - 5th Grade                         X
  6th - 12th Grade
  College
  Post Collegiate
  Include All

Select Level(s) of Complexity   [214]
Complexity or Technical Level(s) [216]  Indicate Selections [218]
  Very Easy
  Easy                                  X
  Average                               X
  Complex or Technical
  Very Complex or Technical
  Include All Levels

Submit Query   [220]

FIG. 2B

302  Please submit as many questions per category as you want for
     the following destination:

306  Website: http://www.website.com
308  Webpage: http://www.website.com/specificpage.html
310  Target within webpage: http://www.website.com/specificpage/#specifictarget.html
312  File/contact info.

WHO: fixes or repairs heat pumps in Eugene, OR?   [316, 316-1; 314, 314-1]
WHAT: is a heat pump?
WHEN: does a heat pump need to be serviced?
WHY: does my heat pump make a rattling sound?
WHERE: can I buy a heat pump in Eugene, OR?
HOW: do I change a fuse on a heat pump?

FIG. 3A

Select Age/Educational Level(s)
Educational Level(s)                    Indicate Selections
  Pre-K
  K - 5th Grade
  6th - 12th Grade
  College
  Post Collegiate
  Include All

Select Level(s) of Complexity
Complexity or Technical Level(s)        Indicate Selections
  Very Easy
  Easy
  Average
  Complex or Technical
  Very Complex or Technical
  Include All Levels